Note: All exploratory data analysis done in this notebook will be based on the time period: 2013 to 2023
In this jupyter notebook, we will attempt to use data visualization tools to explore the relationships between features (both pre-defined by the original dataset and self-defined by us) and the corresponding HDB resale prices in Singapore. This data exploration process is done after Data Cleaning & Feature Engineering.
This is a complete and more detailed exploratory data analysis on the HDB Resale Price Dataset, of which only some features have been selected to be presented in the Final Report due to space constraints and importance of other aspects of the project.
After which, we will compare the trends across the two time periods in the Final Report.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator
import matplotlib as mpl
import seaborn as sns
sns.set_theme(style="white", rc={"axes.facecolor": (0, 0, 0, 0)})
import random
%matplotlib inline
hdb_price_df_13_23: Pandas DataFrame object representing the features and information related to HDB flats in Singapore, for the time period — 2003 to 2012 (10 years)
hdb_price_df_13_23: Pandas DataFrame object representing the features and information related to HDB flats in Singapore, for the time period — 2013 to February 2023 (10 years & 2 months)
hdb_price_df: Pandas DataFrame object will be the combination of both DataFrames — hdb_price_df_13_23 & hdb_price_df_13_23
hdb_price_df_03_12 = pd.read_csv('hdb-price-with-features-2003to2012.csv').iloc[:, 1:]
hdb_price_df_13_23 = pd.read_csv('hdb-price-with-features-2013to2023.csv').iloc[:, 1:]
hdb_price_df_13_23.head()
| month | town | flat_type | block | street_name | storey_range | floor_area_sqm | flat_model | lease_commence_date | remaining_lease | ... | mall_count_within_1km | mall_list_within_1km | school_shortest_dist | school_shortest_dist_name | good_sch_count_within_1km | good_sch_list_within_1km | good_sch_score | good_sch_score_sap_gep | distance_to_cbd_km | matured | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2023-03-01 | BISHAN | 4 ROOM | 289 | BISHAN ST 24 | 22 TO 24 | 103.0 | Premium Apartment | 1998 | 74.333333 | ... | 1 | ['Junction 8'] | 0.214393 | Catholic High School | 1 | ['Catholic High School'] | 3 | 2 | 7.635584 | 1 |
| 1 | 2022-12-01 | BISHAN | 4 ROOM | 289 | BISHAN ST 24 | 22 TO 24 | 102.0 | Premium Apartment | 1998 | 74.583333 | ... | 1 | ['Junction 8'] | 0.214393 | Catholic High School | 1 | ['Catholic High School'] | 3 | 2 | 7.635584 | 1 |
| 2 | 2022-11-01 | BISHAN | 4 ROOM | 289 | BISHAN ST 24 | 16 TO 18 | 102.0 | Premium Apartment | 1998 | 74.666667 | ... | 1 | ['Junction 8'] | 0.214393 | Catholic High School | 1 | ['Catholic High School'] | 3 | 2 | 7.635584 | 1 |
| 3 | 2022-06-01 | BISHAN | 5 ROOM | 289 | BISHAN ST 24 | 07 TO 09 | 121.0 | Premium Apartment | 1998 | 75.083333 | ... | 1 | ['Junction 8'] | 0.214393 | Catholic High School | 1 | ['Catholic High School'] | 3 | 2 | 7.635584 | 1 |
| 4 | 2021-12-01 | BISHAN | 4 ROOM | 289 | BISHAN ST 24 | 07 TO 09 | 102.0 | Premium Apartment | 1998 | 75.583333 | ... | 1 | ['Junction 8'] | 0.214393 | Catholic High School | 1 | ['Catholic High School'] | 3 | 2 | 7.635584 | 1 |
5 rows × 30 columns
# new 'year' feature - just to check correctness of time frame
hdb_price_df_03_12['year'] = pd.DatetimeIndex(hdb_price_df_03_12['month']).year
print("Range of Years in \x1B[3mhdb_price_df_13_23\x1B[0m DataFrame:", np.sort(hdb_price_df_03_12['year'].unique()))
hdb_price_df_13_23['year'] = pd.DatetimeIndex(hdb_price_df_13_23['month']).year
print("Range of Years in \x1B[3mhdb_price_df_13_23\x1B[0m DataFrame:", np.sort(hdb_price_df_13_23['year'].unique()))
Range of Years in hdb_price_df_13_23 DataFrame: [2003 2004 2005 2006 2007 2008 2009 2010 2011 2012] Range of Years in hdb_price_df_13_23 DataFrame: [2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023]
adjusted_price: indicates the adjusted HDB Resale Price after accounting for inflation
# Read CPI data
cpi = pd.read_csv('monthly cpi.csv')
cpi = cpi.set_index('Data Series')
cpi_data = pd.DataFrame(cpi.columns, columns=['month'])
cpi_data['CPI'] = pd.to_numeric(np.array(cpi.loc['All Items']))
cpi_data = cpi_data.iloc[1: , :]
cpi_data['month'] = pd.to_datetime(cpi_data['month'])
# adjust prices for inflation using CPI
hdb_price_df_03_12['month'] = pd.to_datetime(hdb_price_df_03_12['month'])
hdb_price_df_03_12 = pd.merge(hdb_price_df_03_12, cpi_data, how = 'left')
hdb_price_df_03_12['adjusted_price'] = (hdb_price_df_03_12['resale_price'] / hdb_price_df_03_12['CPI']) * 100
hdb_price_df_13_23['month'] = pd.to_datetime(hdb_price_df_13_23['month'])
hdb_price_df_13_23 = pd.merge(hdb_price_df_13_23, cpi_data, how = 'left')
hdb_price_df_13_23['adjusted_price'] = (hdb_price_df_13_23['resale_price'] / hdb_price_df_13_23['CPI']) * 100
hdb_price_df_13_23.head()
| month | town | flat_type | block | street_name | storey_range | floor_area_sqm | flat_model | lease_commence_date | remaining_lease | ... | school_shortest_dist_name | good_sch_count_within_1km | good_sch_list_within_1km | good_sch_score | good_sch_score_sap_gep | distance_to_cbd_km | matured | year | CPI | adjusted_price | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2023-03-01 | BISHAN | 4 ROOM | 289 | BISHAN ST 24 | 22 TO 24 | 103.0 | Premium Apartment | 1998 | 74.333333 | ... | Catholic High School | 1 | ['Catholic High School'] | 3 | 2 | 7.635584 | 1 | 2023 | NaN | NaN |
| 1 | 2022-12-01 | BISHAN | 4 ROOM | 289 | BISHAN ST 24 | 22 TO 24 | 102.0 | Premium Apartment | 1998 | 74.583333 | ... | Catholic High School | 1 | ['Catholic High School'] | 3 | 2 | 7.635584 | 1 | 2022 | 111.186 | 764484.737287 |
| 2 | 2022-11-01 | BISHAN | 4 ROOM | 289 | BISHAN ST 24 | 16 TO 18 | 102.0 | Premium Apartment | 1998 | 74.666667 | ... | Catholic High School | 1 | ['Catholic High School'] | 3 | 2 | 7.635584 | 1 | 2022 | 110.959 | 683135.212105 |
| 3 | 2022-06-01 | BISHAN | 5 ROOM | 289 | BISHAN ST 24 | 07 TO 09 | 121.0 | Premium Apartment | 1998 | 75.083333 | ... | Catholic High School | 1 | ['Catholic High School'] | 3 | 2 | 7.635584 | 1 | 2022 | 108.671 | 809783.658934 |
| 4 | 2021-12-01 | BISHAN | 4 ROOM | 289 | BISHAN ST 24 | 07 TO 09 | 102.0 | Premium Apartment | 1998 | 75.583333 | ... | Catholic High School | 1 | ['Catholic High School'] | 3 | 2 | 7.635584 | 1 | 2021 | 104.439 | 579285.515947 |
5 rows × 33 columns
# also for the full dataset
hdb_price_df = pd.concat([hdb_price_df_03_12, hdb_price_df_13_23], axis=0)
print(hdb_price_df_03_12['month'].nunique() + hdb_price_df_13_23['month'].nunique() == hdb_price_df['month'].nunique())
True
hdb_price_df.head()
| month | town | flat_type | block | street_name | storey_range | floor_area_sqm | flat_model | lease_commence_date | remaining_lease | ... | school_shortest_dist_name | good_sch_count_within_1km | good_sch_list_within_1km | good_sch_score | good_sch_score_sap_gep | distance_to_cbd_km | matured | year | CPI | adjusted_price | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2012-09-01 | BISHAN | 4 ROOM | 289 | BISHAN ST 24 | 07 TO 09 | 100.0 | Premium Apartment | 1998 | 85.0 | ... | Catholic High School | 1 | ['Catholic High School'] | 3 | 2 | 7.635584 | 1 | 2012 | 97.298 | 657773.027195 |
| 1 | 2012-03-01 | BISHAN | 4 ROOM | 289 | BISHAN ST 24 | 01 TO 05 | 102.0 | Premium Apartment | 1998 | 85.0 | ... | Catholic High School | 1 | ['Catholic High School'] | 3 | 2 | 7.635584 | 1 | 2012 | 95.207 | 630205.762181 |
| 2 | 2012-03-01 | BISHAN | 5 ROOM | 289 | BISHAN ST 24 | 11 TO 15 | 121.0 | Premium Apartment | 1998 | 85.0 | ... | Catholic High School | 1 | ['Catholic High School'] | 3 | 2 | 7.635584 | 1 | 2012 | 95.207 | 766750.343987 |
| 3 | 2012-02-01 | BISHAN | 5 ROOM | 289 | BISHAN ST 24 | 07 TO 09 | 121.0 | Premium Apartment | 1998 | 85.0 | ... | Catholic High School | 1 | ['Catholic High School'] | 3 | 2 | 7.635584 | 1 | 2012 | 94.491 | 756685.821930 |
| 4 | 2011-12-01 | BISHAN | 4 ROOM | 289 | BISHAN ST 24 | 07 TO 09 | 100.0 | Premium Apartment | 1998 | 86.0 | ... | Catholic High School | 1 | ['Catholic High School'] | 3 | 2 | 7.635584 | 1 | 2011 | 93.877 | 637003.738935 |
5 rows × 33 columns
# sanity check of adjusted prices
random_index = random.randint(0, len(hdb_price_df)+1)
print((hdb_price_df.iloc[random_index]['adjusted_price'] in hdb_price_df_03_12['adjusted_price'].to_list()) or
(hdb_price_df.iloc[random_index]['adjusted_price'] in hdb_price_df_13_23['adjusted_price'].to_list()))
True
hdb_price_df_13_23.info()
<class 'pandas.core.frame.DataFrame'> Int64Index: 217191 entries, 0 to 217190 Data columns (total 33 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 month 217191 non-null datetime64[ns] 1 town 217191 non-null object 2 flat_type 217191 non-null object 3 block 217191 non-null object 4 street_name 217191 non-null object 5 storey_range 217191 non-null object 6 floor_area_sqm 217191 non-null float64 7 flat_model 217191 non-null object 8 lease_commence_date 217191 non-null int64 9 remaining_lease 217191 non-null float64 10 resale_price 217191 non-null float64 11 address 217191 non-null object 12 latitude 217191 non-null float64 13 longitude 217191 non-null float64 14 shortest_dist_mrt_name 217191 non-null object 15 shortest_dist_mrt 217191 non-null float64 16 mrt_count_within_1km 217191 non-null int64 17 mrt_list_within_1km 217191 non-null object 18 shortest_dist_mall_name 217191 non-null object 19 shortest_dist_mall 217191 non-null float64 20 mall_count_within_1km 217191 non-null int64 21 mall_list_within_1km 217191 non-null object 22 school_shortest_dist 217191 non-null float64 23 school_shortest_dist_name 217191 non-null object 24 good_sch_count_within_1km 217191 non-null int64 25 good_sch_list_within_1km 217191 non-null object 26 good_sch_score 217191 non-null int64 27 good_sch_score_sap_gep 217191 non-null int64 28 distance_to_cbd_km 217191 non-null float64 29 matured 217191 non-null int64 30 year 217191 non-null int64 31 CPI 214784 non-null float64 32 adjusted_price 214784 non-null float64 dtypes: datetime64[ns](1), float64(11), int64(8), object(13) memory usage: 56.3+ MB
# column names
hdb_price_df_13_23.columns
Index(['month', 'town', 'flat_type', 'block', 'street_name', 'storey_range',
'floor_area_sqm', 'flat_model', 'lease_commence_date',
'remaining_lease', 'resale_price', 'address', 'latitude', 'longitude',
'shortest_dist_mrt_name', 'shortest_dist_mrt', 'mrt_count_within_1km',
'mrt_list_within_1km', 'shortest_dist_mall_name', 'shortest_dist_mall',
'mall_count_within_1km', 'mall_list_within_1km', 'school_shortest_dist',
'school_shortest_dist_name', 'good_sch_count_within_1km',
'good_sch_list_within_1km', 'good_sch_score', 'good_sch_score_sap_gep',
'distance_to_cbd_km', 'matured', 'year', 'CPI', 'adjusted_price'],
dtype='object')
The table below summarizes the physical features of a flat that we will be conducting EDA with —
| No. | Feature Name | Values |
| :---| :----: | :---: |
| 1 | flat_type | 1 Room, 2 Room, ... , Executive, Multi-Generation|
| 2 | year | 2003, 2004, ... , 2012 |
| 3 | town | Bukit Timah, Woodlands, Dhoby Ghaut, ... |
| 4 | storey_range | 01 to 03, 03 to 05, ... , 40 to 42|
| 5 | floor_area_sqm | |
| 6 | blocknum | |
| 7 | flat_model | Simplified, New Generation, ..., Terrace, Premium Maisonette |
| 8 | lease_commence_date | 1966, 1967, ..., 2010, 2012|
| 9 | remaining_lease | |
print('Unique Flat Types from 2013 to 2023:\n',hdb_price_df_13_23['flat_type'].unique())
Unique Flat Types from 2013 to 2023: ['4 ROOM' '5 ROOM' '3 ROOM' 'EXECUTIVE' '2 ROOM' '1 ROOM' 'MULTI-GENERATION']
flat_count_13_23 = hdb_price_df_13_23.groupby(['flat_type']).size().reset_index()
flat_count_13_23.rename(columns={0: 'count'}, inplace=True)
flat_count_13_23['flat_type_proportion'] = (flat_count_13_23['count']/flat_count_13_23['count'].sum())*100
flat_count_13_23['flat_type_proportion'] = flat_count_13_23['flat_type_proportion'].round(1)
flat_count_13_23
| flat_type | count | flat_type_proportion | |
|---|---|---|---|
| 0 | 1 ROOM | 98 | 0.0 |
| 1 | 2 ROOM | 3035 | 1.4 |
| 2 | 3 ROOM | 54630 | 25.2 |
| 3 | 4 ROOM | 89434 | 41.2 |
| 4 | 5 ROOM | 53198 | 24.5 |
| 5 | EXECUTIVE | 16709 | 7.7 |
| 6 | MULTI-GENERATION | 87 | 0.0 |
fig, ax = plt.subplots(figsize=(13.33, 7.5), dpi=96)
# Grid
# ax.grid(which="major", axis='x', color='#DAD8D7', alpha=0.5, zorder=1)
#ax.grid(which="major", axis='y', color='#DAD8D7', alpha=0.5, zorder=1)
# Set colors for bar graph
flat_type_proportion_highest = flat_count_13_23['flat_type_proportion'].max()
colors=['#193441']*len(flat_count_13_23)
for index, row in flat_count_13_23.iterrows():
if row['flat_type_proportion'] == flat_type_proportion_highest:
colors[index] = '#448c79'
else:
colors[index] = '#193441'
# Plot bars
bar1 = ax.bar(flat_count_13_23['flat_type'], flat_count_13_23['flat_type_proportion'], width=0.6, color=colors)
# Label x-axis and x ticks
ax.set_xlabel('', fontsize=12, labelpad=10) # No need for an axis label
ax.xaxis.set_label_position("bottom")
# Reformat y-axis
ax.set_ylabel('Percentage of All Flat Types (%)', fontsize=12, labelpad=10)
ax.yaxis.set_label_position("left")
ax.yaxis.set_major_formatter(lambda s, i : f'{s:,.0f}')
ax.yaxis.set_major_locator(MaxNLocator(integer=True))
ax.yaxis.set_tick_params(pad=2, labeltop=False, labelbottom=True, bottom=False, labelsize=12)
# Add label on top of each bar
ax.bar_label(bar1, labels=[f'{e:,.0f}%' for e in flat_count_13_23['flat_type_proportion']], padding=3, color='black', fontsize=12)
# Remove spines
ax.spines[['top','left','bottom']].set_visible(False)
# Make right spine thicker
ax.spines['right'].set_linewidth(1.1)
# Add rectangle box and line (beautification purposes only)
ax.plot([0.08, .9], [.98, .98], transform=fig.transFigure, clip_on=False, color='#193441', linewidth=.6)
ax.add_patch(plt.Rectangle((0.08,.98), 0.04, -0.02, facecolor='#193441', transform=fig.transFigure, clip_on=False, linewidth = 0))
# Add in title and subtitle
ax.text(x=0.08, y=.93, s="2013 to 2023: Distribution of HDB Flat Types", transform=fig.transFigure, ha='left', fontsize=16, weight='bold', alpha=.8)
ax.text(x=0.08, y=.90, s="Bar Graph — Proportions of the different flat types available in Singapore HDB Market", transform=fig.transFigure, ha='left', fontsize=12, alpha=.8)
# Adjust the margins around the plot area
plt.subplots_adjust(left=None, bottom=0.2, right=None, top=0.85, wspace=None, hspace=None)
# Set a white background
fig.patch.set_facecolor('white')
Observations:
For the time period 2013 to 2023, 4 Room Flats take up the majority of flats in Singapore HDB market, accounting to more than 40% of all flat types. 3 Room and 4 Room takes up about 25% each. Followed by Executive Flat Types at 8%.
There are proportionately very few 1 room, 2 rooms and multi-generation flat types. As such, we will remove the following flat types for subsequent visualizations:
Comparing across both 10-year periods:
4 Room flats have increased in its proportion from 38% to 41% across, from time period 2003 to 2012 to time period 2013 to 2023. In the recent decade (i.e. 2013 to 2023), there is a more even proportion between 3 and 5 Room flats compared to the previous decade (i.e. 2003 to 2012). Executive Flat Types remains at a constant proportion of 8% of all Flat Types. 1 Room, 2 Room and Multi-Generation Flat Types remains almost negligble in terms of proportion across both decades. </font>
print('Unique flat types before removing:', hdb_price_df_03_12['flat_type'].unique(),'\n')
keep_flat_type = ['3 ROOM', '4 ROOM', '5 ROOM', 'EXECUTIVE']
hdb_price_df_03_12 = hdb_price_df_03_12[hdb_price_df_03_12['flat_type'].isin(keep_flat_type)]
print('Unique flat types after removing 1 rooms, 2 rooms and multi-generation flat types:',hdb_price_df_03_12['flat_type'].unique())
# do the same for 2013 to 2023
hdb_price_df_13_23 = hdb_price_df_13_23[hdb_price_df_13_23['flat_type'].isin(keep_flat_type)]
hdb_price_df = hdb_price_df[hdb_price_df['flat_type'].isin(keep_flat_type)]
Unique flat types before removing: ['4 ROOM' '5 ROOM' '3 ROOM' 'EXECUTIVE' '2 ROOM' '1 ROOM' 'MULTI-GENERATION'] Unique flat types after removing 1 rooms, 2 rooms and multi-generation flat types: ['4 ROOM' '5 ROOM' '3 ROOM' 'EXECUTIVE']
# create new feature flat_type_year
def get_year(x):
return str(x)[:4]
hdb_price_df_13_23['flat_type_year'] = hdb_price_df_13_23['flat_type'] + ' - ' + hdb_price_df_13_23['month'].apply(lambda x: str(x)[:4])
hdb_price_df_13_23.head()
| month | town | flat_type | block | street_name | storey_range | floor_area_sqm | flat_model | lease_commence_date | remaining_lease | ... | good_sch_count_within_1km | good_sch_list_within_1km | good_sch_score | good_sch_score_sap_gep | distance_to_cbd_km | matured | year | CPI | adjusted_price | flat_type_year | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2023-03-01 | BISHAN | 4 ROOM | 289 | BISHAN ST 24 | 22 TO 24 | 103.0 | Premium Apartment | 1998 | 74.333333 | ... | 1 | ['Catholic High School'] | 3 | 2 | 7.635584 | 1 | 2023 | NaN | NaN | 4 ROOM - 2023 |
| 1 | 2022-12-01 | BISHAN | 4 ROOM | 289 | BISHAN ST 24 | 22 TO 24 | 102.0 | Premium Apartment | 1998 | 74.583333 | ... | 1 | ['Catholic High School'] | 3 | 2 | 7.635584 | 1 | 2022 | 111.186 | 764484.737287 | 4 ROOM - 2022 |
| 2 | 2022-11-01 | BISHAN | 4 ROOM | 289 | BISHAN ST 24 | 16 TO 18 | 102.0 | Premium Apartment | 1998 | 74.666667 | ... | 1 | ['Catholic High School'] | 3 | 2 | 7.635584 | 1 | 2022 | 110.959 | 683135.212105 | 4 ROOM - 2022 |
| 3 | 2022-06-01 | BISHAN | 5 ROOM | 289 | BISHAN ST 24 | 07 TO 09 | 121.0 | Premium Apartment | 1998 | 75.083333 | ... | 1 | ['Catholic High School'] | 3 | 2 | 7.635584 | 1 | 2022 | 108.671 | 809783.658934 | 5 ROOM - 2022 |
| 4 | 2021-12-01 | BISHAN | 4 ROOM | 289 | BISHAN ST 24 | 07 TO 09 | 102.0 | Premium Apartment | 1998 | 75.583333 | ... | 1 | ['Catholic High School'] | 3 | 2 | 7.635584 | 1 | 2021 | 104.439 | 579285.515947 | 4 ROOM - 2021 |
5 rows × 34 columns
# visualize using joypy library
import joypy
# time block: 2004 to 2013
fig, ax = joypy.joyplot(hdb_price_df_13_23,
by='flat_type_year',
column='adjusted_price',
figsize=(12,8),
title='Distribution of Resale Prices based on Flat Type for 2013 to 2023',
linewidth=0.05,
overlap=1.5,
alpha=0.8,
colormap=plt.cm.get_cmap('tab20',4))
ax[-1].set_xlim([0, 1400000])
ax[-1].set_xticklabels(['0', '200k', '400k', '600k', '800k', '1000k', '1200k', '1400k'])
plt.xlabel('Resale Price SGD ($)', fontsize=14)
plt.show()
Observations from the above Ridgeline Plot:
Consistent pattern throughout the years that HDB flats with more rooms tend to fetch higher resale prices, which is unsurprising.
Looking at how prices vary for the same number of rooms (i.e. same flat type), HDB resale prices generally stay the same over the years for the same flat type from 2013 to 2023.
hdb_price_df_13_23['year'] = pd.DatetimeIndex(hdb_price_df_13_23['month']).year # extract out year
town_prices_13_23 = hdb_price_df_13_23.groupby(['town'], as_index=False).agg({'adjusted_price': 'median'}).sort_values('adjusted_price', ascending=True).reset_index(drop=True)
town_prices_13_23['adjusted_price'] = round(town_prices_13_23['adjusted_price']/1000) # convert prices in terms of SGD1000 for easier visualization since prices are all in '000s
town_prices_13_23
| town | adjusted_price | |
|---|---|---|
| 0 | ANG MO KIO | 370.0 |
| 1 | YISHUN | 371.0 |
| 2 | BUKIT BATOK | 384.0 |
| 3 | BEDOK | 389.0 |
| 4 | GEYLANG | 398.0 |
| 5 | WOODLANDS | 402.0 |
| 6 | JURONG EAST | 405.0 |
| 7 | TOA PAYOH | 414.0 |
| 8 | SEMBAWANG | 415.0 |
| 9 | JURONG WEST | 418.0 |
| 10 | CHOA CHU KANG | 420.0 |
| 11 | HOUGANG | 429.0 |
| 12 | CLEMENTI | 430.0 |
| 13 | BUKIT PANJANG | 434.0 |
| 14 | SENGKANG | 459.0 |
| 15 | TAMPINES | 468.0 |
| 16 | SERANGOON | 472.0 |
| 17 | PUNGGOL | 473.0 |
| 18 | MARINE PARADE | 474.0 |
| 19 | KALLANG/WHAMPOA | 494.0 |
| 20 | PASIR RIS | 496.0 |
| 21 | CENTRAL AREA | 542.0 |
| 22 | QUEENSTOWN | 593.0 |
| 23 | BUKIT MERAH | 602.0 |
| 24 | BISHAN | 613.0 |
| 25 | BUKIT TIMAH | 711.0 |
fig, ax = plt.subplots(figsize=(13.33, 13), dpi=96)
lowest_median_price, highest_median_price = town_prices_13_23['adjusted_price'].min(), town_prices_13_23['adjusted_price'].max()
colors=['#e3dfda']*len(town_prices_13_23)
# set to lowest and highest prices to red and green respectively
for index, row in town_prices_13_23.iterrows():
if row['adjusted_price'] == lowest_median_price:
colors[index] = '#e69798'
if row['adjusted_price'] == highest_median_price:
colors[index] = '#448c79'
# Grid
ax.grid(which="major", axis='x', alpha = 0.5,color='#DAD8D7', zorder=1)
ax.grid(which="major", axis='y', alpha=0.5, color='#DAD8D7', zorder=1)
# Plot bars
bar1=ax.barh(town_prices_13_23['town'], town_prices_13_23['adjusted_price'], color=colors)
# Label x-axis and x ticks
ax.set_xlabel("Median Resale Prices (SGD'000)", fontsize=12, labelpad=10)
ax.xaxis.set_tick_params(pad=2, labelbottom=True, bottom=True, labelsize=12, labelrotation=0)
# Add label on top of each bar
ax.bar_label(bar1, labels=[f'{e:,.0f}' for e in town_prices_13_23['adjusted_price']], padding=3, color='black', fontsize=10)
# Remove spines and make right spine thicker
ax.spines[['top','left','bottom']].set_visible(False)
ax.spines['right'].set_linewidth(1.1)
# Add rectangle box and line (beautification purposes only)
ax.plot([0.05, .9], [.98, .98], transform=fig.transFigure, clip_on=False, color='#193441', linewidth=.6)
ax.add_patch(plt.Rectangle((0.05,.98), 0.04, -0.02, facecolor='#193441', transform=fig.transFigure, clip_on=False, linewidth = 0))
# Add in title and subtitle
ax.text(x=0.05, y=.93, s="2013 to 2023: Median HDB Resale Prices by Town", transform=fig.transFigure, ha='left', fontsize=14, weight='bold', alpha=.8)
ax.text(x=0.05, y=.91, s="Horizontal Bar Graph — Comparing the prices of HDB across different Towns", transform=fig.transFigure, ha='left', fontsize=12, alpha=.8)
# Adjust the margins around the plot area
plt.subplots_adjust(left=None, bottom=0.2, right=None, top=0.9, wspace=None, hspace=None)
plt.show()
Observations:
Bukit Timah is the most expensive town across 2013 to 2023. It has the highest median HDB Resale Price of SGD$711,000.
Ang Mo Kio has the lowest median resale price of SGD$370,000, indicating that it is the least expensive town across 2013 to 2023.
Comparing across both 10-year periods:
Bukit Timah remains the most expensive town across both decades, from 2003 all the way to 2023 and it has increased in its median price. </font>
town_prices_04_23 = hdb_price_df[hdb_price_df['year'].isin([2003,2023])].groupby(['town','year'], as_index=False).agg({'adjusted_price': 'median'})
town_prices_04_23['change'] = town_prices_04_23.groupby('town')['adjusted_price'].apply(lambda x: x.pct_change()*100).round() # calculate percentage change between 2004 and 2023
town_prices_04_23 = town_prices_04_23[town_prices_04_23['change'].notnull()]
town_prices_04_23 = town_prices_04_23.sort_values('change', ascending=True).reset_index(drop=True)
town_prices_04_23
| town | year | adjusted_price | change | |
|---|---|---|---|---|
| 0 | JURONG EAST | 2023 | 411540.705764 | 29.0 |
| 1 | PASIR RIS | 2023 | 570033.304308 | 40.0 |
| 2 | MARINE PARADE | 2023 | 471287.377577 | 45.0 |
| 3 | CHOA CHU KANG | 2023 | 475775.828793 | 47.0 |
| 4 | TAMPINES | 2023 | 502706.536083 | 48.0 |
| 5 | BEDOK | 2023 | 386006.804492 | 49.0 |
| 6 | SENGKANG | 2023 | 511683.438513 | 51.0 |
| 7 | BUKIT TIMAH | 2023 | 585742.883561 | 53.0 |
| 8 | HOUGANG | 2023 | 480264.280008 | 54.0 |
| 9 | SERANGOON | 2023 | 473980.448307 | 55.0 |
| 10 | SEMBAWANG | 2023 | 483855.040980 | 58.0 |
| 11 | BISHAN | 2023 | 662495.399338 | 61.0 |
| 12 | WOODLANDS | 2023 | 475775.828793 | 61.0 |
| 13 | GEYLANG | 2023 | 382864.888642 | 64.0 |
| 14 | BUKIT PANJANG | 2023 | 475775.828793 | 65.0 |
| 15 | ANG MO KIO | 2023 | 403960.609352 | 69.0 |
| 16 | JURONG WEST | 2023 | 457822.023932 | 69.0 |
| 17 | TOA PAYOH | 2023 | 457822.023932 | 80.0 |
| 18 | CENTRAL AREA | 2023 | 505798.181280 | 81.0 |
| 19 | YISHUN | 2023 | 421914.414212 | 84.0 |
| 20 | CLEMENTI | 2023 | 500462.310475 | 99.0 |
| 21 | KALLANG/WHAMPOA | 2023 | 534125.694588 | 111.0 |
| 22 | BUKIT BATOK | 2023 | 538614.145803 | 114.0 |
| 23 | BUKIT MERAH | 2023 | 658904.638365 | 160.0 |
| 24 | QUEENSTOWN | 2023 | 610429.365243 | 174.0 |
fig, ax = plt.subplots(figsize=(13.33, 13), dpi=96)
smallest_price_change, largest_price_change = town_prices_04_23['change'].min(), town_prices_04_23['change'].max()
colors=['#e3dfda']*len(town_prices_04_23)
# set to lowest and highest prices to red and green respectively
for index, row in town_prices_04_23.iterrows():
if row['change'] == smallest_price_change:
colors[index] = '#e69798'
if row['change'] == largest_price_change:
colors[index] = '#448c79'
# Plot bars
bar1 = ax.barh(town_prices_04_23['town'], town_prices_04_23['change'], color=colors)
# Grid
ax.grid(which="major", axis='x', color='#DAD8D7', alpha=0.5, zorder=1)
ax.grid(which="major", axis='y', color='#DAD8D7', alpha=0.5, zorder=1)
# Label x-axis and x ticks
ax.set_xlabel("Percentage Change in Median Resale Prices (%)", fontsize=13, labelpad=10)
ax.xaxis.set_tick_params(pad=2, labelbottom=True, bottom=True, labelsize=12, labelrotation=0)
# Add label on top of each bar
ax.bar_label(bar1, labels=[f'{e:,.0f}' for e in town_prices_04_23['change']], padding=3, color='black', fontsize=11)
# Remove the spines and make the right spine thicker
ax.spines[['top','left','bottom']].set_visible(False)
ax.spines['right'].set_linewidth(1.1)
# Add in red line and rectangle on top
ax.plot([0, 0.9], [.98, .98], transform=fig.transFigure, clip_on=False, color='#193441', linewidth=.6)
ax.add_patch(plt.Rectangle((0,0.98), 0.04, -0.02, facecolor='#193441', transform=fig.transFigure, clip_on=False, linewidth = 0))
# Add in title and subtitle
ax.text(x=0, y=.93, s="2003 vs 2023: Percentage Change of HDB Resale Prices by Town", transform=fig.transFigure, ha='left', fontsize=14, weight='bold', alpha=.8)
ax.text(x=0, y=.91, s="Horizontal Bar Graph — How HDB resale prices changed in the last two decades, based on the town's median price", transform=fig.transFigure, ha='left', fontsize=12, alpha=.8)
# Adjust the margins around the plot area
plt.subplots_adjust(left=None, bottom=0.2, right=None, top=0.9, wspace=None, hspace=None)
plt.show()
Observations:
Comparing the percentage change in median HDB resale prices between 2003 and this year, 2023, all the towns in Singapore has generally increased in its median HDB resale prices.
Queenstown has the highest percentage change in its resale prices (157% which is more than double) while Jurong East has the lowest increase in resale prices, at only 29%.
town_prices_22_23 = hdb_price_df[hdb_price_df['year'].isin([2022,2023])].groupby(['town','year'], as_index=False).agg({'adjusted_price': 'median'})
town_prices_22_23
| town | year | adjusted_price | |
|---|---|---|---|
| 0 | ANG MO KIO | 2022 | 395947.680293 |
| 1 | ANG MO KIO | 2023 | 403960.609352 |
| 2 | BEDOK | 2022 | 410962.607810 |
| 3 | BEDOK | 2023 | 386006.804492 |
| 4 | BISHAN | 2022 | 643169.539491 |
| 5 | BISHAN | 2023 | 662495.399338 |
| 6 | BUKIT BATOK | 2022 | 450218.677643 |
| 7 | BUKIT BATOK | 2023 | 538614.145803 |
| 8 | BUKIT MERAH | 2022 | 624537.762297 |
| 9 | BUKIT MERAH | 2023 | 658904.638365 |
| 10 | BUKIT PANJANG | 2022 | 467684.780458 |
| 11 | BUKIT PANJANG | 2023 | 475775.828793 |
| 12 | BUKIT TIMAH | 2022 | 659954.263717 |
| 13 | BUKIT TIMAH | 2023 | 585742.883561 |
| 14 | CENTRAL AREA | 2022 | 555236.976962 |
| 15 | CENTRAL AREA | 2023 | 505798.181280 |
| 16 | CHOA CHU KANG | 2022 | 478633.431848 |
| 17 | CHOA CHU KANG | 2023 | 475775.828793 |
| 18 | CLEMENTI | 2022 | 489400.846482 |
| 19 | CLEMENTI | 2023 | 500462.310475 |
| 20 | GEYLANG | 2022 | 419613.518655 |
| 21 | GEYLANG | 2023 | 382864.888642 |
| 22 | HOUGANG | 2022 | 471274.889205 |
| 23 | HOUGANG | 2023 | 480264.280008 |
| 24 | JURONG EAST | 2022 | 418703.294103 |
| 25 | JURONG EAST | 2023 | 411540.705764 |
| 26 | JURONG WEST | 2022 | 464151.158107 |
| 27 | JURONG WEST | 2023 | 457822.023932 |
| 28 | KALLANG/WHAMPOA | 2022 | 624148.542663 |
| 29 | KALLANG/WHAMPOA | 2023 | 534125.694588 |
| 30 | MARINE PARADE | 2022 | 449443.230573 |
| 31 | MARINE PARADE | 2023 | 471287.377577 |
| 32 | PASIR RIS | 2022 | 560912.222812 |
| 33 | PASIR RIS | 2023 | 570033.304308 |
| 34 | PUNGGOL | 2022 | 511008.800389 |
| 35 | PUNGGOL | 2023 | 520660.340943 |
| 36 | QUEENSTOWN | 2022 | 644864.420804 |
| 37 | QUEENSTOWN | 2023 | 610429.365243 |
| 38 | SEMBAWANG | 2022 | 484866.567838 |
| 39 | SEMBAWANG | 2023 | 483855.040980 |
| 40 | SENGKANG | 2022 | 501448.107151 |
| 41 | SENGKANG | 2023 | 511683.438513 |
| 42 | SERANGOON | 2022 | 495746.454346 |
| 43 | SERANGOON | 2023 | 473980.448307 |
| 44 | TAMPINES | 2022 | 502917.480831 |
| 45 | TAMPINES | 2023 | 502706.536083 |
| 46 | TOA PAYOH | 2022 | 489653.518975 |
| 47 | TOA PAYOH | 2023 | 457822.023932 |
| 48 | WOODLANDS | 2022 | 466314.696363 |
| 49 | WOODLANDS | 2023 | 475775.828793 |
| 50 | YISHUN | 2022 | 427030.053426 |
| 51 | YISHUN | 2023 | 421914.414212 |
town_prices_22_23 = hdb_price_df[hdb_price_df['year'].isin([2021,2022])].groupby(['town','year'], as_index=False).agg({'adjusted_price': 'median'})
town_prices_22_23['change'] = town_prices_22_23.groupby('town')['adjusted_price'].apply(lambda x: x.pct_change()*100).round() # calculate percentage change between 2004 and 2023
town_prices_22_23 = town_prices_22_23[town_prices_22_23['change'].notnull()]
town_prices_22_23 = town_prices_22_23.sort_values('change', ascending=True).reset_index(drop=True).reset_index()
town_prices_22_23
| index | town | year | adjusted_price | change | |
|---|---|---|---|---|---|
| 0 | 0 | CENTRAL AREA | 2022 | 555236.976962 | -16.0 |
| 1 | 1 | GEYLANG | 2022 | 419613.518655 | -13.0 |
| 2 | 2 | QUEENSTOWN | 2022 | 644864.420804 | -7.0 |
| 3 | 3 | CLEMENTI | 2022 | 489400.846482 | -7.0 |
| 4 | 4 | MARINE PARADE | 2022 | 449443.230573 | -6.0 |
| 5 | 5 | JURONG EAST | 2022 | 418703.294103 | -5.0 |
| 6 | 6 | BEDOK | 2022 | 410962.607810 | -4.0 |
| 7 | 7 | BUKIT TIMAH | 2022 | 659954.263717 | -4.0 |
| 8 | 8 | SERANGOON | 2022 | 495746.454346 | -1.0 |
| 9 | 9 | BUKIT MERAH | 2022 | 624537.762297 | -1.0 |
| 10 | 10 | BUKIT PANJANG | 2022 | 467684.780458 | -1.0 |
| 11 | 11 | TAMPINES | 2022 | 502917.480831 | 1.0 |
| 12 | 12 | YISHUN | 2022 | 427030.053426 | 1.0 |
| 13 | 13 | HOUGANG | 2022 | 471274.889205 | 1.0 |
| 14 | 14 | CHOA CHU KANG | 2022 | 478633.431848 | 2.0 |
| 15 | 15 | TOA PAYOH | 2022 | 489653.518975 | 2.0 |
| 16 | 16 | PASIR RIS | 2022 | 560912.222812 | 3.0 |
| 17 | 17 | BISHAN | 2022 | 643169.539491 | 3.0 |
| 18 | 18 | JURONG WEST | 2022 | 464151.158107 | 4.0 |
| 19 | 19 | PUNGGOL | 2022 | 511008.800389 | 4.0 |
| 20 | 20 | SENGKANG | 2022 | 501448.107151 | 5.0 |
| 21 | 21 | WOODLANDS | 2022 | 466314.696363 | 6.0 |
| 22 | 22 | ANG MO KIO | 2022 | 395947.680293 | 7.0 |
| 23 | 23 | SEMBAWANG | 2022 | 484866.567838 | 8.0 |
| 24 | 24 | BUKIT BATOK | 2022 | 450218.677643 | 8.0 |
| 25 | 25 | KALLANG/WHAMPOA | 2022 | 624148.542663 | 19.0 |
fig, ax = plt.subplots(figsize=(13.33, 13), dpi=96)
# smallest_price_change, largest_price_change = town_prices_22_23['change'].min(), town_prices_22_23['change'].max()
colors=['#e3dfda']*len(town_prices_22_23)
# set to lowest and highest prices to red and green respectively
for index, row in town_prices_22_23.iterrows():
if row['change'] < 0:
colors[index] = '#e69798'
if row['change'] > 0:
colors[index] = '#448c79'
# Plot bars
bar1 = ax.barh(town_prices_22_23['town'], town_prices_22_23['change'], color=colors)
# Grid
ax.grid(which="major", axis='x', color='#DAD8D7', alpha=0.5, zorder=1)
ax.grid(which="major", axis='y', color='#DAD8D7', alpha=0.5, zorder=1)
# Label x-axis and x ticks
ax.set_xlabel("Percentage Change in Median Resale Prices (%)", fontsize=13, labelpad=10)
ax.xaxis.set_tick_params(pad=2, labelbottom=True, bottom=True, labelsize=12, labelrotation=0)
# Add label on top of each bar
ax.bar_label(bar1, labels=[f'{e:,.0f}' for e in town_prices_22_23['change']], padding=3, color='black', fontsize=11)
# Remove the spines
ax.spines[['top','left','bottom']].set_visible(False)
# Make the right spine thicker
ax.spines['right'].set_linewidth(1.1)
# Add in red line and rectangle on top
ax.plot([0, 0.9], [.98, .98], transform=fig.transFigure, clip_on=False, color='#193441', linewidth=.6)
ax.add_patch(plt.Rectangle((0,0.98), 0.04, -0.02, facecolor='#193441', transform=fig.transFigure, clip_on=False, linewidth = 0))
# Add in title and subtitle
ax.text(x=0, y=.93, s="2022 vs 2023: Percentage Change of HDB Resale Prices by Town", transform=fig.transFigure, ha='left', fontsize=14, weight='bold', alpha=.8)
ax.text(x=0, y=.91, s="Horizontal Bar Graph — How HDB resale prices changed in the past two years, based on the town's median price", transform=fig.transFigure, ha='left', fontsize=12, alpha=.8)
# Adjust the margins around the plot area
plt.subplots_adjust(left=None, bottom=0.2, right=None, top=0.9, wspace=None, hspace=None)
plt.show()
Observations:
The changes are not very significant from 2022 to 2023, generally increase/decrease by about 10%.
Prices have dropped the most (16%) in Central Area while on the other hand, prices in Kallang/Whampoa have increased the most, by 19%.
Could it be because these areas have older HDB meaning their lease are now shorter? As shown below, it seems that places like Punggol, and Sengkang, tend to have later lease commence date, as they were developed later, which might have led to their slight increase in prices, while places like Toa Payoh and Central Area, tend to have older lease commence date.
hdb_price_df[hdb_price_df['year'].isin([2022,2023])].groupby('town')['lease_commence_date'].median().sort_values()
town MARINE PARADE 1975.0 ANG MO KIO 1980.0 BEDOK 1980.0 CLEMENTI 1981.0 GEYLANG 1982.0 TOA PAYOH 1984.0 CENTRAL AREA 1984.0 JURONG EAST 1984.0 SERANGOON 1986.0 BUKIT BATOK 1987.0 YISHUN 1988.0 BISHAN 1988.0 TAMPINES 1988.0 BUKIT TIMAH 1988.0 HOUGANG 1991.0 PASIR RIS 1993.0 BUKIT MERAH 1996.0 KALLANG/WHAMPOA 1996.0 JURONG WEST 1997.0 WOODLANDS 1998.0 BUKIT PANJANG 1999.0 CHOA CHU KANG 1999.0 QUEENSTOWN 2003.0 SEMBAWANG 2004.0 SENGKANG 2012.0 PUNGGOL 2014.0 Name: lease_commence_date, dtype: float64
# Storey Prices for 2013 to 2023
storey_13_23 = hdb_price_df_13_23.groupby('storey_range')['adjusted_price'].median().reset_index().sort_values(by='storey_range')
storey_13_23['storey_rank'] = storey_13_23['storey_range'].astype('category').cat.codes + 1 # label encode storey -> give each storey range a rank
storey_13_23
| storey_range | adjusted_price | storey_rank | |
|---|---|---|---|
| 0 | 01 TO 03 | 3.984660e+05 | 1 |
| 1 | 04 TO 06 | 4.151394e+05 | 2 |
| 2 | 07 TO 09 | 4.289781e+05 | 3 |
| 3 | 10 TO 12 | 4.398682e+05 | 4 |
| 4 | 13 TO 15 | 4.781860e+05 | 5 |
| 5 | 16 TO 18 | 5.133940e+05 | 6 |
| 6 | 19 TO 21 | 6.043963e+05 | 7 |
| 7 | 22 TO 24 | 6.381870e+05 | 8 |
| 8 | 25 TO 27 | 7.081029e+05 | 9 |
| 9 | 28 TO 30 | 7.865138e+05 | 10 |
| 10 | 31 TO 33 | 8.152569e+05 | 11 |
| 11 | 34 TO 36 | 8.361254e+05 | 12 |
| 12 | 37 TO 39 | 8.441096e+05 | 13 |
| 13 | 40 TO 42 | 8.885473e+05 | 14 |
| 14 | 43 TO 45 | 9.452847e+05 | 15 |
| 15 | 46 TO 48 | 1.031838e+06 | 16 |
| 16 | 49 TO 51 | 1.097620e+06 | 17 |
fig, ax = plt.subplots(figsize=(13.33, 13), dpi=96)
# Set colors
colors = ['#193441']*len(storey_13_23)
for index, row in storey_13_23.iterrows():
if row['adjusted_price'] > 700000:
colors[index] = '#e69798'
scatter1=sns.scatterplot(x=storey_13_23['storey_range'], y=storey_13_23['adjusted_price'], s=storey_13_23['storey_rank'].astype('int')*30, sizes=(20,200),color=colors, edgecolors='w', alpha=0.5, ax=ax)
# Grid
ax.grid(which="major", axis='x', color='#DAD8D7', alpha=0.5, zorder=1)
ax.grid(which="major", axis='y', color='#DAD8D7', alpha=0.5, zorder=1)
# Label data points
for i, row in storey_13_23.iterrows():
# offset = row['storey_rank'] / 50
label = str(round(row['adjusted_price']/1000)) + 'K'
plt.annotate(label, xy=(row['storey_range'], row['adjusted_price']), ha='center', va='bottom', xytext=(0, 14), textcoords='offset points', fontsize=11)
# Label x-axis and x ticks
ax.set_xlabel("Percentage Change in Median Resale Prices (%)", fontsize=12, labelpad=10)
ax.xaxis.set_tick_params(pad=2, labelbottom=True, bottom=True, labelsize=12, labelrotation=0)
plt.xticks(rotation=45, fontsize=10)
# Reformat y-axis
ylabels = ['{:,.0f}'.format(x) + 'K' for x in scatter1.get_yticks()/1000]
scatter1.set_yticklabels(ylabels)
scatter1.set_ylim([300000,1200000])
scatter1.set_ylabel('Resale Price SGD ($)', size=15)
scatter1.set_xlabel('Storey', size=15)
# Remove the spines and make right spine thicker
ax.spines[['top','left','bottom']].set_visible(False)
ax.spines['right'].set_linewidth(1.1)
ax.plot([0.05, .9], [.98, .98], transform=fig.transFigure, clip_on=False, color='#193441', linewidth=.6)
ax.add_patch(plt.Rectangle((0.05,.98), 0.04, -0.02, facecolor='#193441', transform=fig.transFigure, clip_on=False, linewidth = 0))
# Add in title and subtitle
ax.text(x=0.05, y=.93, s="2013 to 2023: Relationship between Storey and HDB Resale Prices", transform=fig.transFigure, ha='left', fontsize=14, weight='bold', alpha=.8)
ax.text(x=0.05, y=.91, s="Bubble Scatterplot — How HDB resale prices are associated with floor level, based on the median price of each storey range", transform=fig.transFigure, ha='left', fontsize=12, alpha=.8)
# Adjust the margins around the plot area
plt.subplots_adjust(left=None, bottom=0.2, right=None, top=0.9, wspace=None, hspace=None)
plt.show()
/var/folders/dr/gs9wbbgs2zlcfyqybn5vvrw40000gn/T/ipykernel_69635/628486241.py:28: UserWarning: FixedFormatter should only be used together with FixedLocator scatter1.set_yticklabels(ylabels)
Observations:
General trend that as floor level increases, the median HDB resale prices are also associated with an increase.
Unlike the dataset from 2003 to 2012, the X-axis labels are already well-labelled in the 2013 to 2023 dataset, hence there is no need for any adjustments.
Comparing across both 10-year periods:
For 2003 to 2012, at the high floors, the HDB Resale Prices tend to plateau and does not increase as much compared to the middle floors. However, for 2013 to 2023, HDB Resale Prices continuously increase as floor level increases, regardless of how high the floor gets. In fact, above 37th floor, the rate of increase of prices is still increasing, and HDB Flats in those floors are fetching close to or more than SGD$1 Million. </font>
fig, ax = plt.subplots(figsize=(13.33, 13), dpi=96)
# Grid
ax.grid(which="major", axis='x', color='#DAD8D7', alpha=0.5, zorder=1)
ax.grid(which="major", axis='y', color='#DAD8D7', alpha=0.5, zorder=1)
p=sns.regplot(x='floor_area_sqm', y='adjusted_price', data=hdb_price_df_13_23, color='#e69798', scatter_kws={"s": 1, 'alpha':0.5, 'color':'#193441'})
ylabels = ['{:,.0f}'.format(x) + 'K' for x in p.get_yticks()/1000]
p.set_yticklabels(ylabels)
p.set_ylabel('Resale Price SGD ($)', size=15)
p.set_xlabel('Floor Area (Square Meters)', size=15)
# Remove the spines and make right thicker
ax.spines[['top','left','bottom']].set_visible(False)
ax.spines['right'].set_linewidth(1.1)
ax.plot([0.05, .9], [.98, .98], transform=fig.transFigure, clip_on=False, color='#193441', linewidth=.6)
ax.add_patch(plt.Rectangle((0.05,.98), 0.04, -0.02, facecolor='#193441', transform=fig.transFigure, clip_on=False, linewidth = 0))
# Add in title and subtitle
ax.text(x=0.05, y=.93, s="2013 to 2023: Relationship between Floor Area and HDB Resale Prices", transform=fig.transFigure, ha='left', fontsize=14, weight='bold', alpha=.8)
ax.text(x=0.05, y=.91, s="Scatterplot & Regression Line — How HDB resale prices are associated with floor area (in sqm)", transform=fig.transFigure, ha='left', fontsize=12, alpha=.8)
plt.show()
/var/folders/dr/gs9wbbgs2zlcfyqybn5vvrw40000gn/T/ipykernel_69635/4232059774.py:9: UserWarning: FixedFormatter should only be used together with FixedLocator p.set_yticklabels(ylabels)
Observations:
As floor area of a HDB resale flat increases, its resale price is associated with an increase as well. Majority of transacted HDB resale flats have a floor area of between 50 and 200 square metres.
Comparing across both 10-year periods:
Not much visible difference between the steepness of graphs showing the relationship between Floor Area and HDB Resale Prices across both decades.
</font>
floor_area_13_23 = hdb_price_df_13_23.groupby(['floor_area_sqm'], as_index=False).agg({'adjusted_price':'median'})
floor_area_13_23
| floor_area_sqm | adjusted_price | |
|---|---|---|
| 0 | 49.0 | 3.138981e+05 |
| 1 | 51.0 | 2.483005e+05 |
| 2 | 52.0 | 2.527525e+05 |
| 3 | 53.0 | 2.531492e+05 |
| 4 | 54.0 | 2.964794e+05 |
| ... | ... | ... |
| 154 | 243.0 | 1.034958e+06 |
| 155 | 249.0 | 1.062119e+06 |
| 156 | 259.0 | 1.163332e+06 |
| 157 | 266.0 | 1.050334e+06 |
| 158 | 280.0 | 1.065873e+06 |
159 rows × 2 columns
fig, ax = plt.subplots(figsize=(13.33, 13), dpi=96)
# Grid
ax.grid(which="major", axis='x', color='#DAD8D7', alpha=0.5, zorder=1)
ax.grid(which="major", axis='y', color='#DAD8D7', alpha=0.5, zorder=1)
p=sns.regplot(x='floor_area_sqm', y='adjusted_price', data=floor_area_13_23, color='#e69798', scatter_kws={"s": 1, 'alpha':0.5, 'color':'#193441'})
ylabels = ['{:,.0f}'.format(x) + 'K' for x in p.get_yticks()/1000]
p.set_yticklabels(ylabels)
p.set_ylabel('Resale Price SGD ($)', size=15)
p.set_xlabel('Floor Area (Square Meters)', size=15)
# Remove the spines and make right thicker
ax.spines[['top','left','bottom']].set_visible(False)
ax.spines['right'].set_linewidth(1.1)
ax.plot([0.05, .9], [.98, .98], transform=fig.transFigure, clip_on=False, color='#193441', linewidth=.6)
ax.add_patch(plt.Rectangle((0.05,.98), 0.04, -0.02, facecolor='#193441', transform=fig.transFigure, clip_on=False, linewidth = 0))
# Add in title and subtitle
ax.text(x=0.05, y=.93, s="2013 to 2023: Relationship between Floor Area and HDB Resale Prices", transform=fig.transFigure, ha='left', fontsize=14, weight='bold', alpha=.8)
ax.text(x=0.05, y=.91, s="Scatterplot & Regression Line — using Median HDB Resale Price for each Floor Area", transform=fig.transFigure, ha='left', fontsize=12, alpha=.8)
plt.show()
/var/folders/dr/gs9wbbgs2zlcfyqybn5vvrw40000gn/T/ipykernel_69635/574426306.py:9: UserWarning: FixedFormatter should only be used together with FixedLocator p.set_yticklabels(ylabels)
# what are those HDBs which have bigger than 200 sqm of floor area
above_200_sqm_df = hdb_price_df_13_23[hdb_price_df_13_23['floor_area_sqm'] > 200].reset_index()
above_200_sqm_df.groupby('flat_model')['index'].nunique()
flat_model Maisonette 2 Premium Maisonette 2 Terrace 10 Name: index, dtype: int64
Observations:
Trend is similar to without taking median prices for each floor area; previous visualization.
Flat Models of HDBs which have larger than 200 sqm of floor area are mostly Terrace, which is not surprising.
Comparing across both 10-year periods:
Not much visible difference between the steepness of graphs showing the relationship between Floor Area and HDB Resale Prices across both decades.
</font>
3 digit system was introduced in the 1970s, with the 1st digit representing a neighbourhood in a town. So for e.g., AMK neighbourhood 1 starts with 101, and AMK neighbourhood 2 starts with 201. So first digit was separated from last 2 digits and plotted separately
import re
# Block Number Prices
get_num = lambda x: int(re.findall("\d+", x)[0])
hdb_price_df_13_23['blocknum'] = hdb_price_df_13_23['block'].apply(get_num) # get only digits from block number
tmp = hdb_price_df_13_23[hdb_price_df_13_23['blocknum'] > 99] # get only blocks that use 3-digit numbering system
tmp = tmp.groupby('blocknum')['adjusted_price'].median().reset_index()
fig, ax = plt.subplots(figsize=(20, 13), dpi=96)
ax1 = plt.subplot(121)
a=sns.scatterplot(x=tmp['blocknum'].apply(lambda x: int(str(x)[0])), y=tmp['adjusted_price'], color='#193441', edgecolors='w', alpha=0.9)
ylabels = ['{:,.0f}'.format(x) + 'K' for x in a.get_yticks()/1000]
ax1.set_yticklabels(ylabels)
ax1.set_ylabel('Resale Price ($SGD)', size=15), ax1.set_xlabel('Neighbourhood Number', size=15)
ax2 = plt.subplot(122)
b=sns.scatterplot(x=tmp['blocknum'].apply(lambda x: int(str(x)[1:])), y=tmp['adjusted_price'], color = '#e69798',edgecolors='w', alpha=0.9)
ax2.set_yticklabels(ylabels)
ax2.set_ylabel('', size=15)
ax2.set_xlabel('Block Number', size=15)
# Add in line and rectangle on top
plt.plot([0.05, .9], [.98, .98], transform=fig.transFigure, clip_on=False, color='#193441', linewidth=.6)
ax1.add_patch(plt.Rectangle((0.05,.98), 0.04, -0.02, facecolor='#193441', transform=fig.transFigure, clip_on=False, linewidth = 0))
# Add in title and subtitle
plt.text(x=0.05, y=.93, s="2013 to 2023: Relationship between Neighbourhood/Block Number and HDB Resale Prices", transform=fig.transFigure, ha='left', fontsize=14, weight='bold', alpha=.8)
plt.text(x=0.05, y=.91, s="Scatterplot — How (if any) HDB Resale Price is associated with Neighbourhood/Block Number", transform=fig.transFigure, ha='left', fontsize=12, alpha=.8)
plt.show()
/var/folders/dr/gs9wbbgs2zlcfyqybn5vvrw40000gn/T/ipykernel_69635/2226950048.py:14: UserWarning: FixedFormatter should only be used together with FixedLocator ax1.set_yticklabels(ylabels) /var/folders/dr/gs9wbbgs2zlcfyqybn5vvrw40000gn/T/ipykernel_69635/2226950048.py:19: UserWarning: FixedFormatter should only be used together with FixedLocator ax2.set_yticklabels(ylabels)
Observations:
Neighbourhood/Block Number does not seem to have any association with HDB Resale Prices.
Comparing across both 10-year periods:
Not much visible difference in the scatter plots across both decades.
</font>
fig, ax = plt.subplots(figsize=(12, 14), dpi=96)
# Grid
ax.grid(which="major", axis='x', color='#DAD8D7', alpha=0.5, zorder=1)
ax.grid(which="major", axis='y', color='#DAD8D7', alpha=0.5, zorder=1)
p=sns.violinplot(x='adjusted_price',
y='flat_model',
data=hdb_price_df_13_23,
width=1,
orient='h', # horizontal plot
order=hdb_price_df_13_23.groupby('flat_model')['adjusted_price'].median().sort_values().reset_index()['flat_model'].tolist(),
palette='Set3')
# Label x-axis and x ticks
p.set_xlabel("Resale Price (SGD$)", fontsize=14, labelpad=10)
xlabels = ['{:,.0f}'.format(x) + 'K' for x in p.get_xticks()/1000]
p.set_xticklabels(xlabels)
ax.xaxis.set_tick_params(pad=2, labelbottom=True, bottom=True, labelsize=12, labelrotation=0)
p.set_xlim([0,1500000])
# Reformat y-axis
# ylabels = ['{:,.0f}'.format(x) + 'K' for x in p.get_yticks()/1000]
# p.set_yticklabels(ylabels)
p.set_ylabel("Flat Model", size=15)
# p.set_ylim([50000,1300000])
# Remove the spines and make right spine thicker
ax.spines[['top','left','bottom']].set_visible(False)
ax.spines['right'].set_linewidth(1.1)
ax.plot([0.05, .9], [.98, .98], transform=fig.transFigure, clip_on=False, color='#193441', linewidth=.6)
ax.add_patch(plt.Rectangle((0.05,.98), 0.04, -0.02, facecolor='#193441', transform=fig.transFigure, clip_on=False, linewidth = 0))
# Add in title and subtitle
ax.text(x=0.05, y=.94, s="2013 to 2023: Relationship between Flat Models and HDB Resale Prices", transform=fig.transFigure, ha='left', fontsize=14, weight='bold', alpha=.8)
ax.text(x=0.05, y=.92, s="Violin Boxplot — Distribution of HDB Resale Prices across different Flat Models", transform=fig.transFigure, ha='left', fontsize=12, alpha=.8)
# Adjust the margins around the plot area
plt.subplots_adjust(left=None, bottom=0.2, right=None, top=0.9, wspace=None, hspace=None)
plt.show()
/var/folders/dr/gs9wbbgs2zlcfyqybn5vvrw40000gn/T/ipykernel_69635/292884573.py:18: UserWarning: FixedFormatter should only be used together with FixedLocator p.set_xticklabels(xlabels)
Observations:
Flat Models like Premium Apartment Loft, Type S1 and Type S2 have relatively high median resale prices, compared to Simplified, New Generation and Standard flat models. Improved Flat Model has the largest range.
Comparing across both 10-year periods:
Not much visible difference between the violin plot distributions showing the relationship between Flat Model and HDB Resale Prices across both decades.
</font>
x = np.array((hdb_price_df_13_23['adjusted_price'] < 0).to_list())
np.unique(x)
array([False])
fig, ax = plt.subplots(figsize=(13.33, 13), dpi=96)
# Grid
ax.grid(which="major", axis='x', color='#DAD8D7', alpha=0.5, zorder=1)
ax.grid(which="major", axis='y', color='#DAD8D7', alpha=0.5, zorder=1)
a=sns.boxplot(y='lease_commence_date', x='adjusted_price', data=hdb_price_df_13_23, width=1, orient='h', flierprops = dict(markerfacecolor = 'red', markersize = 0.1, linestyle='none'), linewidth=0.4)
ax.set_xlabel('Resale Price ($SGD)', size=15)
xlabels = ['{:,.0f}'.format(x) + 'K' for x in a.get_xticks()/1000]
ax.set_xticklabels(xlabels)
p.set_xlim([0,1300000])
ax.set_ylabel('Lease Commence Year', size=15)
# Remove the spines and make right spine thicker
ax.spines[['top','left','bottom']].set_visible(False)
ax.spines['right'].set_linewidth(1.1)
ax.plot([0.05, .9], [.98, .98], transform=fig.transFigure, clip_on=False, color='#193441', linewidth=.6)
ax.add_patch(plt.Rectangle((0.05,.98), 0.04, -0.02, facecolor='#193441', transform=fig.transFigure, clip_on=False, linewidth = 0))
# Add in title and subtitle
ax.text(x=0.05, y=.93, s="2013 to 2023: Relationship between Lease Commence (Year) and HDB Resale Prices", transform=fig.transFigure, ha='left', fontsize=14, weight='bold', alpha=.8)
ax.text(x=0.05, y=.91, s="Boxplot — Distribution of Resale Prices amoung HDB Flats with different years of Lease Commencement", transform=fig.transFigure, ha='left', fontsize=12, alpha=.8)
# Adjust the margins around the plot area
plt.subplots_adjust(left=None, bottom=0.2, right=None, top=0.9, wspace=None, hspace=None)
plt.show()
/var/folders/dr/gs9wbbgs2zlcfyqybn5vvrw40000gn/T/ipykernel_69635/4102906681.py:10: UserWarning: FixedFormatter should only be used together with FixedLocator ax.set_xticklabels(xlabels)
Observations:
HDB Flats with earlier Lease Commence Dates generally fetch lower Median Resale Prices compared to those that have more recent Lease Commence Dates. However, HDB Flats with 2011 as their Lease Commence Date has the highest Median Resale Prices, even HDB Flats that are newer (i.e. Lease Commence Date later than 2011) have lower Median Resale Prices.
Comparing across both 10-year periods:
Not much visible difference between the boxplot distributions showing the relationship between Lease Commence and HDB Resale Prices across both decades.
</font>
hdb_price_df['remaining_lease'].unique()
array([ 85. , 86. , 87. , 88. ,
89. , 90. , 91. , 92. ,
93. , 94. , 70. , 71. ,
72. , 73. , 74. , 75. ,
76. , 77. , 78. , 79. ,
80. , 81. , 82. , 83. ,
84. , 65. , 66. , 67. ,
68. , 69. , 95. , 96. ,
54. , 55. , 56. , 57. ,
58. , 59. , 60. , 61. ,
62. , 63. , 64. , 97. ,
98. , 99. , 100. , 101. ,
74.33333333, 74.58333333, 74.66666667, 75.08333333,
75.58333333, 76.5 , 76.66666667, 76.75 ,
79.08333333, 80.16666667, 59.41666667, 60.16666667,
60.33333333, 60.91666667, 60.83333333, 61.08333333,
61.25 , 61.41666667, 61.66666667, 62.33333333,
62.5 , 62.75 , 63.33333333, 63.5 ,
63.58333333, 64.33333333, 65.16666667, 65.33333333,
65.5 , 62.91666667, 63.25 , 63.66666667,
64.16666667, 66.83333333, 67.33333333, 67.58333333,
94.75 , 94.91666667, 64.08333333, 64.66666667,
64.75 , 64.91666667, 65.41666667, 65.66666667,
65.58333333, 66.33333333, 66.58333333, 68.33333333,
61.91666667, 63.41666667, 63.91666667, 64.41666667,
65.25 , 65.83333333, 66.5 , 66.91666667,
60.66666667, 66.66666667, 64.25 , 64.83333333,
67.5 , 67.75 , 67.83333333, 67.91666667,
68.16666667, 68.41666667, 68.83333333, 69.41666667,
63.75 , 63.83333333, 64.58333333, 67.08333333,
67.25 , 67.16666667, 68.5 , 68.58333333,
68.66666667, 59.16666667, 59.91666667, 60.08333333,
63.08333333, 64.5 , 54.25 , 55.5 ,
55.58333333, 56.08333333, 56.25 , 56.33333333,
56.66666667, 58.33333333, 59.66666667, 59.75 ,
62.41666667, 62.58333333, 63.16666667, 57.66666667,
57.75 , 58.16666667, 58.25 , 58.75 ,
58.66666667, 59.58333333, 60.41666667, 61.33333333,
61.83333333, 62.16666667, 62.66666667, 66.16666667,
66.75 , 67.41666667, 68.75 , 54.91666667,
55.25 , 55.66666667, 56.16666667, 57.16666667,
58.91666667, 60.58333333, 62.25 , 65.08333333,
66.41666667, 69.08333333, 75.66666667, 76.16666667,
76.41666667, 76.83333333, 77.08333333, 77.16666667,
77.33333333, 77.41666667, 78.25 , 78.5 ,
78.58333333, 78.75 , 79.41666667, 79.33333333,
79.75 , 80.25 , 81.08333333, 81.25 ,
81.33333333, 81.41666667, 81.5 , 80.08333333,
80.41666667, 81.16666667, 82.16666667, 82.25 ,
82.33333333, 82.66666667, 83.75 , 83.83333333,
84.33333333, 84.5 , 84.75 , 84.83333333,
84.91666667, 85.08333333, 85.58333333, 85.75 ,
85.83333333, 75.75 , 76.08333333, 76.25 ,
77.5 , 77.66666667, 77.83333333, 77.91666667,
78.33333333, 79.25 , 80.75 , 81.58333333,
69.33333333, 69.75 , 69.91666667, 70.16666667,
70.33333333, 71.91666667, 71.83333333, 72.83333333,
73.75 , 74.16666667, 74.41666667, 69.25 ,
70.91666667, 73.66666667, 75.91666667, 75.83333333,
77.25 , 77.75 , 79.58333333, 79.91666667,
81.83333333, 61.16666667, 66.25 , 69.83333333,
92.08333333, 92.58333333, 92.66666667, 92.91666667,
93.16666667, 93.58333333, 93.83333333, 93.91666667,
94.16666667, 94.33333333, 94.25 , 94.41666667,
94.5 , 94.58333333, 94.66666667, 95.16666667,
95.25 , 92.33333333, 92.41666667, 92.75 ,
93.08333333, 94.08333333, 76.91666667, 78.08333333,
78.16666667, 78.83333333, 78.91666667, 79.16666667,
79.66666667, 68.08333333, 65.91666667, 69.58333333,
70.08333333, 70.83333333, 71.16666667, 71.33333333,
62.83333333, 87.83333333, 87.91666667, 88.08333333,
88.25 , 89.08333333, 89.33333333, 89.5 ,
89.75 , 90.33333333, 90.41666667, 91.33333333,
92.16666667, 93.25 , 93.41666667, 93.66666667,
55.41666667, 55.91666667, 56.41666667, 56.5 ,
56.58333333, 57.91666667, 59.33333333, 59.25 ,
55.08333333, 55.75 , 55.83333333, 58.58333333,
73.91666667, 74.5 , 74.83333333, 75.33333333,
75.41666667, 81.91666667, 82.08333333, 82.41666667,
82.5 , 82.83333333, 83.66666667, 84.41666667,
84.66666667, 73.16666667, 73.41666667, 73.5 ,
73.58333333, 73.83333333, 75.5 , 76.58333333,
77.58333333, 78.66666667, 69.66666667, 70.58333333,
70.66666667, 71.08333333, 71.25 , 71.58333333,
74.25 , 74.75 , 93.75 , 94.83333333,
95.08333333, 88.41666667, 88.58333333, 88.5 ,
89.16666667, 89.83333333, 90.16666667, 90.5 ,
91.16666667, 91.58333333, 92.25 , 93.5 ,
73.25 , 74.91666667, 75.25 , 83.16666667,
83.25 , 83.33333333, 83.41666667, 83.58333333,
84.16666667, 80.5 , 80.66666667, 80.91666667,
81.75 , 82.91666667, 84.08333333, 85.25 ,
71.5 , 71.66666667, 72.5 , 72.41666667,
72.58333333, 56.83333333, 59.08333333, 59.83333333,
60.25 , 61.5 , 61.75 , 62.08333333,
57.08333333, 57.33333333, 61.58333333, 56.75 ,
57.25 , 57.41666667, 57.5 , 57.58333333,
57.83333333, 60.5 , 54.5 , 54.83333333,
55.16666667, 58.08333333, 58.5 , 54.58333333,
59.5 , 60.75 , 44.16666667, 44.25 ,
44.41666667, 44.66666667, 44.83333333, 44.91666667,
46.58333333, 46.75 , 47. , 48.08333333,
48.5 , 48.58333333, 48.91666667, 49. ,
50. , 52. , 53. , 45.5 ,
47.08333333, 47.91666667, 48.75 , 51. ,
67.66666667, 45.08333333, 45.16666667, 45.58333333,
45.91666667, 46.08333333, 46.16666667, 46.5 ,
46.41666667, 47.33333333, 47.83333333, 47.75 ,
48. , 49.5 , 49.91666667, 50.5 ,
50.66666667, 50.83333333, 45.33333333, 45.66666667,
46.33333333, 47.16666667, 48.16666667, 49.08333333,
49.58333333, 49.75 , 49.83333333, 46.66666667,
46.91666667, 47.5 , 49.41666667, 49.33333333,
49.66666667, 53.25 , 53.33333333, 54.08333333,
54.41666667, 56.91666667, 58.41666667, 65.75 ,
93.33333333, 95.41666667, 71.41666667, 72.75 ,
66.08333333, 47.66666667, 48.33333333, 50.08333333,
50.33333333, 50.41666667, 50.75 , 51.33333333,
51.58333333, 51.75 , 51.83333333, 52.08333333,
52.25 , 52.66666667, 52.91666667, 72.91666667,
78.41666667, 79.83333333, 82.58333333, 83.08333333,
83.91666667, 79.5 , 80.33333333, 80.58333333,
80.83333333, 81.66666667, 84.25 , 50.16666667,
50.58333333, 51.91666667, 52.16666667, 52.41666667,
52.5 , 52.58333333, 53.5 , 53.91666667,
54.33333333, 54.75 , 58.83333333, 88.83333333,
88.91666667, 89.58333333, 90.58333333, 90.83333333,
91.5 , 92.5 , 86.91666667, 87.5 ,
87.58333333, 87.66666667, 87.75 , 88.16666667,
88.33333333, 88.75 , 89.41666667, 89.91666667,
90.08333333, 90.25 , 90.66666667, 90.75 ,
90.91666667, 91.66666667, 91.75 , 91.83333333,
92.83333333, 82.75 , 84.58333333, 85.5 ,
86.25 , 86.75 , 87.08333333, 87.16666667,
87.25 , 87.41666667, 85.33333333, 85.41666667,
85.66666667, 50.91666667, 52.33333333, 52.75 ,
53.08333333, 53.16666667, 72.33333333, 71.75 ,
72.25 , 70.5 , 72.66666667, 73.33333333,
95.33333333, 95.5 , 95.91666667, 74.08333333,
48.66666667, 48.83333333, 49.25 , 50.25 ,
51.08333333, 51.41666667, 51.5 , 52.83333333,
53.66666667, 53.83333333, 48.41666667, 51.16666667,
53.41666667, 53.75 , 91.91666667, 95.75 ,
95.66666667, 42.66666667, 44.08333333, 46.25 ,
91.25 , 91.41666667, 88.66666667, 91.08333333,
70.41666667, 72.08333333, 68.25 , 70.25 ,
53.58333333, 69.5 , 55.33333333, 51.25 ,
51.66666667, 54.16666667, 68.91666667, 69.16666667,
54.66666667, 83.5 , 85.16666667, 86.16666667,
86.33333333, 86.66666667, 85.91666667, 86.41666667,
86.5 , 86.58333333, 89.25 , 87.33333333,
89.66666667, 95.58333333, 73.08333333, 75.16666667,
46.83333333, 86.83333333, 86.08333333, 76.33333333,
70.75 , 47.58333333, 48.25 , 45.41666667,
49.16666667, 43.33333333, 43.66666667, 43.75 ,
44.75 , 46. , 45. , 45.75 ,
95.83333333, 96.58333333, 72.16666667, 96.33333333,
96.75 , 47.25 , 47.41666667, 45.25 ,
45.83333333, 43.41666667, 43.58333333, 43.83333333,
44.58333333, 96.16666667, 96.5 , 44.33333333,
96.08333333, 43.08333333, 43.25 , 43.5 ,
43. , 44.5 , 44. , 96.66666667,
97.16666667, 96.25 , 96.41666667, 43.16666667,
43.91666667, 97.25 , 97.41666667, 96.91666667,
97.33333333, 97.08333333, 97.75 , 96.83333333,
97.58333333])
hdb_price_df['remaining_lease'].nunique()
661
remaining_lease_13_23 = hdb_price_df_13_23[['remaining_lease','adjusted_price']]
print(remaining_lease_13_23['adjusted_price'].min(),remaining_lease_13_23['adjusted_price'].max())
139610.48674198985 1302877.7242824065
remaining_lease_13_23
| remaining_lease | adjusted_price | |
|---|---|---|
| 0 | 74.333333 | NaN |
| 1 | 74.583333 | 764484.737287 |
| 2 | 74.666667 | 683135.212105 |
| 3 | 75.083333 | 809783.658934 |
| 4 | 75.583333 | 579285.515947 |
| ... | ... | ... |
| 217186 | 59.000000 | 351606.689699 |
| 217187 | 55.000000 | 355865.392536 |
| 217188 | 66.000000 | 339813.823214 |
| 217189 | 81.000000 | 779965.088238 |
| 217190 | 70.000000 | 482057.867327 |
213971 rows × 2 columns
remaining_lease_13_23.nunique()
remaining_lease 658 adjusted_price 54180 dtype: int64
fig, ax = plt.subplots(figsize=(13.33, 13), dpi=96)
# Grid
ax.grid(which="major", axis='x', color='#DAD8D7', alpha=0.5, zorder=1)
ax.grid(which="major", axis='y', color='#DAD8D7', alpha=0.5, zorder=1)
# 2004 to 2013
p=sns.regplot(x='remaining_lease', y='adjusted_price', data=remaining_lease_13_23, color='#e69798', scatter_kws={"s": 1, 'alpha':0.5, 'color':'#193441'})
# Label x-axis and x ticks
ax.set_xlabel("Remaining Lease (in Years)", fontsize=14, labelpad=10)
ax.xaxis.set_tick_params(pad=2, labelbottom=True, bottom=True, labelsize=12, labelrotation=0)
# Reformat y-axis
p.set_ylabel('Resale Price (SGD$)', size=15)
ylabels = ['{:,.0f}'.format(x) + 'K' for x in p.get_yticks()/1000]
p.set_yticklabels(ylabels)
p.set_ylim([50000,1100000])
# Remove the spines and make right thicker
ax.spines[['top','left','bottom']].set_visible(False)
ax.spines['right'].set_linewidth(1.1)
# Add in line and rectangle on top
ax.plot([0.05, .9], [.98, .98], transform=fig.transFigure, clip_on=False, color='#193441', linewidth=.6)
ax.add_patch(plt.Rectangle((0.05,.98), 0.04, -0.02, facecolor='#193441', transform=fig.transFigure, clip_on=False, linewidth = 0))
# Add in title and subtitle
ax.text(x=0.05, y=.93, s="2013 to 2023: Relationship between Remaining Lease Left and HDB Resale Prices", transform=fig.transFigure, ha='left', fontsize=14, weight='bold', alpha=.8)
ax.text(x=0.05, y=.91, s="Scatterplot and Regression Line — How (if any) HDB Resale Price is associated with Remaining Number of Years of Lease", transform=fig.transFigure, ha='left', fontsize=12, alpha=.8)
# Adjust the margins around the plot area
plt.subplots_adjust(left=None, bottom=0.2, right=None, top=0.9, wspace=None, hspace=None)
plt.show()
/var/folders/dr/gs9wbbgs2zlcfyqybn5vvrw40000gn/T/ipykernel_69635/1070124517.py:17: UserWarning: FixedFormatter should only be used together with FixedLocator p.set_yticklabels(ylabels)
Observations:
Newer HDB Flats (i.e. longer remaining lease) are associated with higher Resale Prices.
Comparing across both 10-year periods:
Not much visible difference between the steepness of graphs showing the relationship between Remaining Lease and HDB Resale Prices across both decades.
</font>
The table below summarizes the locational features of a flat that we will be conducting EDA with —
| No. | Feature Name | Values |
| :---| :----: | :---: |
| 1 | mrt_count_within_1km | 1, 2, 3, ...|
| 2 | mall_count_within_1km | 1, 2, 3, ...|
| 3 | good_sch_count_within_1km | 1, 2, 3, ...|
| 4 | shortest_dist_mrt | 1, 2, 3, ...|
| 5 | shortest_dist_mall | 1, 2, 3, ...|
| 6 | good_sch_score | 1, 2, 3, ...|
| 7 | distance_to_cbd_km | 1, 2, 3, ...|
| 8 | matured | 0 or 1|
Amenities are defined as MRT Stations, Shopping Malls and Good Schools.
import re
fig, ax = plt.subplots(figsize=(16,8), dpi=96)
# Number of MRT stations within 1 km
ax1 = plt.subplot(131)
a=sns.regplot(x='mrt_count_within_1km', y='adjusted_price', data=hdb_price_df_13_23, color='#e69798', scatter_kws={"s": 1, 'alpha':0.5, 'color':'#193441'})
ylabels = ['{:,.0f}'.format(x) + 'K' for x in a.get_yticks()/1000]
ax1.set_yticklabels(ylabels)
ax1.set_ylabel('Resale Price (SGD$)', size=15)
ax1.set_xlabel('Number of MRT Stations', size=15)
# Number of Malls within 1 km
ax2 = plt.subplot(132)
b=sns.regplot(x='mall_count_within_1km', y='adjusted_price', data=hdb_price_df_13_23, color = '#e69798', scatter_kws={"s": 1, 'alpha':0.5, 'color':'#193441'})
ax2.set_yticklabels(ylabels)
ax2.set_ylabel('', size=15)
ax2.set_xlabel('Number of Malls', size=15)
# Number of Good Schools within 1 km
ax3 = plt.subplot(133)
c=sns.regplot(x='good_sch_count_within_1km', y='adjusted_price', data=hdb_price_df_13_23, color = '#e69798', scatter_kws={"s": 1, 'alpha':0.5, 'color':'#193441'})
ax3.set_yticklabels(ylabels)
ax3.set_ylabel('', size=15)
ax3.set_xlabel('Number of Good Schools', size=15)
# Add in line and rectangle on top
ax1.plot([0.05, .9], [.98, .98], transform=fig.transFigure, clip_on=False, color='#193441', linewidth=.6)
ax1.add_patch(plt.Rectangle((0.05,.98), 0.04, -0.02, facecolor='#193441', transform=fig.transFigure, clip_on=False, linewidth = 0))
# Add in title and subtitle
ax1.text(x=0.05, y=.93, s="2013 to 2023: Relationship between Number of Amenities within 1km and HDB Resale Prices", transform=fig.transFigure, ha='left', fontsize=14, weight='bold', alpha=.8)
ax1.text(x=0.05, y=.90, s="Scatterplot & Regression Line — How (if any) HDB Resale Price is associated with Number of Nearby Amenities like MRT Stations, Shopping Malls and Good Schools", transform=fig.transFigure, ha='left', fontsize=12, alpha=.8)
plt.show()
/var/folders/dr/gs9wbbgs2zlcfyqybn5vvrw40000gn/T/ipykernel_69635/1145574367.py:9: UserWarning: FixedFormatter should only be used together with FixedLocator ax1.set_yticklabels(ylabels) /var/folders/dr/gs9wbbgs2zlcfyqybn5vvrw40000gn/T/ipykernel_69635/1145574367.py:16: UserWarning: FixedFormatter should only be used together with FixedLocator ax2.set_yticklabels(ylabels) /var/folders/dr/gs9wbbgs2zlcfyqybn5vvrw40000gn/T/ipykernel_69635/1145574367.py:23: UserWarning: FixedFormatter should only be used together with FixedLocator ax3.set_yticklabels(ylabels)
Observations:
(Relative Comparision)
Number of MRT Stations within 1km vs HDB Resale Price:
Number of Malls within 1km vs HDB Resale Price:
Number of Good Schools within 1km vs HDB Resale Price:
Comparing across both 10-year periods:
We can see a trend that having more MRT Stations that are nearby HDB Flats, are starting to get significantly more popular in the recent decade as shown in the increase in steepness of the regression line (2013 to 2023 graph). We cannot really say the same for the number of nearby malls and good schools.
</font>
print(hdb_price_df_13_23['shortest_dist_mrt'].min(), hdb_price_df_13_23['shortest_dist_mrt'].max())
0.0360827902285726 4.080782752857259
fig, ax = plt.subplots(figsize=(13.33, 13), dpi=96)
# Grid
ax.grid(which="major", axis='x', color='#DAD8D7', alpha=0.5, zorder=1)
ax.grid(which="major", axis='y', color='#DAD8D7', alpha=0.5, zorder=1)
p=sns.regplot(x='shortest_dist_mrt', y='adjusted_price', data=hdb_price_df_13_23, color='#e69798', scatter_kws={"s": 1, 'alpha':0.5, 'color':'#193441'})
# Label x-axis and x ticks
ax.set_xlabel("Shortest Distance to nearest MRT (km)", fontsize=14, labelpad=10)
ax.xaxis.set_tick_params(pad=2, labelbottom=True, bottom=True, labelsize=12, labelrotation=0)
# Reformat y-axis
p.set_ylabel('Resale Price SGD ($)', size=15)
ylabels = ['{:,.0f}'.format(x) + 'K' for x in p.get_yticks()/1000]
p.set_yticklabels(ylabels)
p.set_ylim([50000,1200000])
# Remove the spines and Make right spine thicker
ax.spines[['top','left','bottom']].set_visible(False)
ax.spines['right'].set_linewidth(1.1)
ax.plot([0.05, .9], [.98, .98], transform=fig.transFigure, clip_on=False, color='#193441', linewidth=.6)
ax.add_patch(plt.Rectangle((0.05,.98), 0.04, -0.02, facecolor='#193441', transform=fig.transFigure, clip_on=False, linewidth = 0))
# Add in title and subtitle
ax.text(x=0.05, y=.93, s="2013 to 2023: Relationship between Distance to Nearest MRT and HDB Resale Prices", transform=fig.transFigure, ha='left', fontsize=14, weight='bold', alpha=.8)
ax.text(x=0.05, y=.91, s="Scatterplot & Regression Line — how HDB resale prices vary with distance to nearest MRT", transform=fig.transFigure, ha='left', fontsize=12, alpha=.8)
# Adjust the margins around the plot area
plt.subplots_adjust(left=None, bottom=0.2, right=None, top=0.9, wspace=None, hspace=None)
plt.show()
/var/folders/dr/gs9wbbgs2zlcfyqybn5vvrw40000gn/T/ipykernel_69635/3737170104.py:16: UserWarning: FixedFormatter should only be used together with FixedLocator p.set_yticklabels(ylabels)
Observations:
General trend is that the further away the HDB is from the nearest MRT, it is associated with a decrease in its resale price.
Surprisingly, some HDBs are more than 4km away from the nearest MRT have a relatively high resale price. Let's see which town(s) these HDBs belong to
Comparing across both 10-year periods:
It is safe to say that all HDBs are within about 4km to the nearest MRT Station in the recent decade, unlike in 2003 to 2012, where there are numerous HDB flats that are as far as more than 7km away from the nearest MRT Station. </font>
towns_more_than_4km_list = hdb_price_df_13_23[hdb_price_df_13_23['shortest_dist_mrt'] > 4]['town'].unique()
towns_more_than_4km = ''.join([str(town) for town in towns_more_than_4km_list])
print('Town(s) with HDBs that are more than 4km away from the nearest MRT Station:\n',towns_more_than_4km)
Town(s) with HDBs that are more than 4km away from the nearest MRT Station: MARINE PARADE
Although Marine Parade is still the furthest away from its nearest MRT Station, at least it is not 7km, like 20 years ago.
fig, ax = plt.subplots(figsize=(13.33, 13), dpi=96)
# Grid
ax.grid(which="major", axis='x', color='#DAD8D7', alpha=0.5, zorder=1)
ax.grid(which="major", axis='y', color='#DAD8D7', alpha=0.5, zorder=1)
p=sns.regplot(x='shortest_dist_mall', y='adjusted_price', data=hdb_price_df_13_23, color='#e69798', scatter_kws={"s": 1, 'alpha':0.5, 'color':'#193441'})
# Label x-axis and x ticks
ax.set_xlabel("Distance to Nearest Shopping Mall (km)", fontsize=14, labelpad=10)
ax.xaxis.set_tick_params(pad=2, labelbottom=True, bottom=True, labelsize=12, labelrotation=0)
# Reformat y-axis
p.set_ylabel('Resale Price (SGD$)', size=15)
ylabels = ['{:,.0f}'.format(x) + 'K' for x in p.get_yticks()/1000]
p.set_yticklabels(ylabels)
p.set_ylim([50000,1200000])
# Remove the spines and Make right spine thicker
ax.spines[['top','left','bottom']].set_visible(False)
ax.spines['right'].set_linewidth(1.1)
ax.plot([0.05, .9], [.98, .98], transform=fig.transFigure, clip_on=False, color='#193441', linewidth=.6)
ax.add_patch(plt.Rectangle((0.05,.98), 0.04, -0.02, facecolor='#193441', transform=fig.transFigure, clip_on=False, linewidth = 0))
# Add in title and subtitle
ax.text(x=0.05, y=.93, s="2013 to 2023: Relationship between Distance to Nearest Shopping Mall and HDB Resale Prices", transform=fig.transFigure, ha='left', fontsize=14, weight='bold', alpha=.8)
ax.text(x=0.05, y=.91, s="Scatterplot & Regression Line — how HDB resale prices vary with distance to nearest Shopping Mall", transform=fig.transFigure, ha='left', fontsize=12, alpha=.8)
# Adjust the margins around the plot area
plt.subplots_adjust(left=None, bottom=0.2, right=None, top=0.9, wspace=None, hspace=None)
plt.show()
/var/folders/dr/gs9wbbgs2zlcfyqybn5vvrw40000gn/T/ipykernel_69635/141229708.py:17: UserWarning: FixedFormatter should only be used together with FixedLocator p.set_yticklabels(ylabels)
Observations:
There is a relatively strong and negative association between Distance to Nearest Shopping Mall and HDB Resale Prices. In other words, HDB Flats that are further away from their Nearest Shopping Mall are generally associated with a lower Resale Price.
Comparing across both 10-year periods:
Not much visible difference between the steepness of graphs showing the relationship between Distance to Nearest Shopping Mall and HDB Resale Prices across both decades.
However, there seems to be HDB Flats that are STILL more than 6km away from the Nearest Shopping Mall that have relatively high resale prices. Let's see which town(s) these HDBs belong to and what are the exact HDBs features. </font>
hdb_more_than_6km_from_nearest_mall = hdb_price_df_13_23[hdb_price_df_13_23['shortest_dist_mall'] > 6]
hdb_more_than_6km_from_nearest_mall
| month | town | flat_type | block | street_name | storey_range | floor_area_sqm | flat_model | lease_commence_date | remaining_lease | ... | good_sch_list_within_1km | good_sch_score | good_sch_score_sap_gep | distance_to_cbd_km | matured | year | CPI | adjusted_price | flat_type_year | blocknum | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 69221 | 2014-09-01 | MARINE PARADE | 5 ROOM | 27 | MARINE CRES | 07 TO 09 | 126.0 | Standard | 1975 | 60.0 | ... | [] | 0 | 0 | 23.277334 | 1 | 2014 | 99.553 | 813636.957199 | 5 ROOM - 2014 | 27 |
| 69222 | 2013-06-01 | MARINE PARADE | 5 ROOM | 27 | MARINE CRES | 04 TO 06 | 128.0 | Standard | 1975 | 61.0 | ... | [] | 0 | 0 | 23.277334 | 1 | 2013 | 97.656 | 832399.442943 | 5 ROOM - 2013 | 27 |
2 rows × 35 columns
towns_more_than_6km_list = hdb_more_than_6km_from_nearest_mall['town'].unique()
towns_more_than_6km = ''.join([str(town) for town in towns_more_than_6km_list])
street_name_more_than_6km_list = hdb_more_than_6km_from_nearest_mall['street_name'].unique()
street_name_more_than_6km = ''.join([str(street_name) for street_name in street_name_more_than_6km_list])
print('Town(s) and corresponding Street Name(s) of HDBs that are more than 6km away from the nearest Shopping Mall:\n Town:',towns_more_than_6km,'\nStreet Name:',street_name_more_than_6km )
Town(s) and corresponding Street Name(s) of HDBs that are more than 6km away from the nearest Shopping Mall: Town: MARINE PARADE Street Name: MARINE CRES
hdb_price_df['good_sch_score'].unique()
array([3, 4, 1, 0, 2])
fig, ax = plt.subplots(figsize=(12, 14), dpi=96)
# Grid
ax.grid(which="major", axis='x', color='#DAD8D7', alpha=0.5, zorder=1)
ax.grid(which="major", axis='y', color='#DAD8D7', alpha=0.5, zorder=1)
p=sns.violinplot(x='good_sch_score',
y='adjusted_price',
data=hdb_price_df_13_23,
width=1,
palette='Set3',
order=hdb_price_df_13_23.groupby('good_sch_score')['adjusted_price'].median().reset_index()['good_sch_score'].tolist())
p.set_xticklabels(p.get_xticklabels(), rotation=0, ha='right')
p.set_xlabel('Good Schools Score', size=15)
ax.xaxis.set_tick_params(pad=2, labelbottom=True, bottom=True, labelsize=12, labelrotation=0)
ylabels = ['{:,.0f}'.format(x) + 'K' for x in p.get_yticks()/1000]
p.set_yticklabels(ylabels)
p.set_ylabel('Resale Price (SGD$)', size=15)
# Remove the spines and make right spine thicker
ax.spines[['top','left','bottom']].set_visible(False)
ax.spines['right'].set_linewidth(1.1)
# Add in line and rectangle on top
ax.plot([0.05, .9], [.98, .98], transform=fig.transFigure, clip_on=False, color='#193441', linewidth=.6)
ax.add_patch(plt.Rectangle((0.05,.98), 0.04, -0.02, facecolor='#193441', transform=fig.transFigure, clip_on=False, linewidth = 0))
# Add in title and subtitle
ax.text(x=0.05, y=.93, s="2013 to 2023: Relationship between Good School Score and HDB Resale Prices", transform=fig.transFigure, ha='left', fontsize=18, weight='bold', alpha=.8)
ax.text(x=0.05, y=.91, s="Violin Boxplot — How HDB resale prices vary with different Good Schools Score", transform=fig.transFigure, ha='left', fontsize=12, alpha=.8)
plt.show()
/var/folders/dr/gs9wbbgs2zlcfyqybn5vvrw40000gn/T/ipykernel_69635/3215674227.py:20: UserWarning: FixedFormatter should only be used together with FixedLocator p.set_yticklabels(ylabels)
Observations:
The scoring of good schools within 1 km of HDB Flat does not seem to be correlated with its resale price, as seen by the similar distribution of resale prices in the violin plot and corresponding median resale prices.
Comparing across both 10-year periods:
Not much visible difference between the distributions of HDB resale prices for the different Good Schools Score. </font>
fig, ax = plt.subplots(figsize=(13.33, 13), dpi=96)
# Grid
ax.grid(which="major", axis='x', color='#DAD8D7', alpha=0.5, zorder=1)
ax.grid(which="major", axis='y', color='#DAD8D7', alpha=0.5, zorder=1)
# 2004 to 2013
p=sns.regplot(x='distance_to_cbd_km', y='adjusted_price', data=hdb_price_df_13_23, color='#e69798', scatter_kws={"s": 1, 'alpha':0.5, 'color':'#193441'})
# Label x-axis and x ticks
ax.set_xlabel("Distance to CBD (in km)", fontsize=14, labelpad=10)
ax.xaxis.set_tick_params(pad=2, labelbottom=True, bottom=True, labelsize=12, labelrotation=0)
# Reformat y-axis
p.set_ylabel('Resale Price SGD ($)', size=15)
ylabels = ['{:,.0f}'.format(x) + 'K' for x in p.get_yticks()/1000]
p.set_yticklabels(ylabels)
p.set_ylim([50000,1300000])
# Remove the spines
ax.spines[['top','left','bottom']].set_visible(False)
ax.spines['right'].set_linewidth(1.1)
# Add in line and rectangle on top
ax.plot([0.05, .9], [.98, .98], transform=fig.transFigure, clip_on=False, color='#193441', linewidth=.6)
ax.add_patch(plt.Rectangle((0.05,.98), 0.04, -0.02, facecolor='#193441', transform=fig.transFigure, clip_on=False, linewidth = 0))
# Add in title and subtitle
ax.text(x=0.05, y=.93, s="2013 to 2023: Relationship between Distance to CBD and HDB Resale Prices", transform=fig.transFigure, ha='left', fontsize=18, weight='bold', alpha=.8)
ax.text(x=0.05, y=.91, s="Scatterplot & Regression Line — How HDB resale prices vary with distance to Central Business District (CBD)", transform=fig.transFigure, ha='left', fontsize=12, alpha=.8)
# Adjust the margins around the plot area
plt.subplots_adjust(left=None, bottom=0.2, right=None, top=0.9, wspace=None, hspace=None)
plt.show()
/var/folders/dr/gs9wbbgs2zlcfyqybn5vvrw40000gn/T/ipykernel_69635/2485396176.py:17: UserWarning: FixedFormatter should only be used together with FixedLocator p.set_yticklabels(ylabels)
Observations:
Strong and negative correlation between Distance to CBD and HDB Resale Prices. The further the distance of HDB Flats from the Central Business District, there is an association that resale prices are lower and vice-versa.
Comparing across both 10-year periods:
In the recent decade (i.e. from 2013 to 2023), there is a stronger negative correlation between Distance to CBD and HDB Resale Prices compared to the earlier decade (i.e. 2003 to 2012); the steeper gradient in the recent decade implies that HDB Flats that are further away from CBD are associated with a more than proportionate dip in its prices. This suggests the likelihood of changing preferences in HDB owners' when it comes to the proximity of the HDB Flats and Central Business District. </font>
hdb_price_df['matured'].unique()
array([1, 0])
fig, ax = plt.subplots(figsize=(13.33, 13), dpi=96)
# Grid
ax.grid(which="major", axis='x', color='#DAD8D7', alpha=0.5, zorder=1)
ax.grid(which="major", axis='y', color='#DAD8D7', alpha=0.5, zorder=1)
p = sns.boxplot(x='matured', y='adjusted_price', data = hdb_price_df_13_23, palette=['#e69798', '#e69798'])
p.set_xticklabels(['Not Matured', 'Matured'], size = 15)
p.set_xlabel('', size=15)
ylabels = ['{:,.0f}'.format(x) + 'K' for x in p.get_yticks()/1000]
p.set_yticklabels(ylabels)
p.set_ylabel('HDB Resale Price (SGD$)')
p.set_ylim([0,1400000])
# Remove the spines and make right thicker
ax.spines[['top','left','bottom']].set_visible(False)
ax.spines['right'].set_linewidth(1.1)
ax.plot([0.05, .9], [.98, .98], transform=fig.transFigure, clip_on=False, color='#193441', linewidth=.6)
ax.add_patch(plt.Rectangle((0.05,.98), 0.04, -0.02, facecolor='#193441', transform=fig.transFigure, clip_on=False, linewidth = 0))
# Add in title and subtitle
ax.text(x=0.05, y=.93, s="2013 to 2023: Relationship between Maturity of Estate and HDB Resale Prices", transform=fig.transFigure, ha='left', fontsize=14, weight='bold', alpha=.8)
ax.text(x=0.05, y=.91, s="Boxplot — Investigate whether or not HDB Flats in Mature Estates are associated with its Resale Prices", transform=fig.transFigure, ha='left', fontsize=12, alpha=.8)
plt.show()
/var/folders/dr/gs9wbbgs2zlcfyqybn5vvrw40000gn/T/ipykernel_69635/3590593329.py:12: UserWarning: FixedFormatter should only be used together with FixedLocator p.set_yticklabels(ylabels)
Observations:
Not much visible association between maturity of estates and resale prices. However, the interquartile range of HDB Resale Prices for Matured Estates is larger than that of Non-Matured Estates.
Comparing across both 10-year periods:
In the recent decade (i.e. from 2013 to 2023), Matured Estates tend to fetch a slightly higher Median HDB Resale Prices compared to Non-Matured Estates while Median Prices are more or less the same regardless of the Estate's Maturity for the previous decade (i.e. 2003 to 2012). </font>